Mining and Ranking Biomedical Synonym Candidates from Wikipedia
Abstract
Biomedical synonyms are important resources for natural language processing (NLP) in the biomedical domain. Existing synonym resources (e.g., the UMLS) are incomplete, and expanding and enriching them manually is prohibitively expensive. We therefore develop and evaluate approaches for automatically extracting synonyms from Wikipedia. Using inter-wiki links, we extract candidate synonym pairs. Each pair consists of the anchor text of a link in a Wikipedia page (e.g., "increased thirst") and the title of the page it links to (e.g., "polyuria"). We rank these candidates using word embeddings and pseudo-relevance feedback (PRF). Our results show that PRF-based reranking outperformed both the word-embedding-based approach and a strong baseline based on inter-wiki link frequency. A hybrid method, Rank Score Combination, achieved the best results. Our analysis also suggests that medical synonyms mined from Wikipedia can increase the coverage of existing synonym resources such …
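The candidate-extraction and ranking steps described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes raw wikitext as input and parses `[[Title|anchor]]` links with a simplified regular expression. It implements only the link-frequency baseline. It also approximates Rank Score Combination with a reciprocal-rank sum, because the abstract does not give the actual formula. All function names are hypothetical, and the sample pages are made-up toy wikitext. Rankings produced by word-embedding or PRF scorers, which are not shown here, could be passed to `combine_ranks` in the same way.

```python
import re
from collections import Counter, defaultdict

# Simplified wikilink pattern (an assumption, not the paper's parser):
# [[Target]], [[Target|anchor text]], or [[Target#Section|anchor text]].
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|([^\[\]]+))?\]\]")


def extract_candidates(wikitext):
    """Count (anchor text, linked page title) pairs in one page's wikitext."""
    pairs = Counter()
    for match in LINK_RE.finditer(wikitext):
        title = match.group(1).strip()
        anchor = (match.group(2) or title).strip()
        # A link whose anchor equals the title yields no new surface form.
        if anchor.lower() != title.lower():
            pairs[(anchor.lower(), title)] += 1
    return pairs


def rank_by_link_frequency(pair_counts, title):
    """Baseline: rank anchor texts pointing at `title` by link frequency."""
    scored = [(anchor, n) for (anchor, t), n in pair_counts.items() if t == title]
    # Higher counts first; ties broken alphabetically for determinism.
    return [anchor for anchor, _ in sorted(scored, key=lambda x: (-x[1], x[0]))]


def combine_ranks(*rankings):
    """Hypothetical rank-score combination: sum 1/rank over all input rankings."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, candidate in enumerate(ranking, start=1):
            scores[candidate] += 1.0 / rank
    return sorted(scores, key=lambda c: (-scores[c], c))


if __name__ == "__main__":
    # Made-up toy pages for illustration only.
    pages = [
        "Patients report [[Polyuria|increased thirst]] and [[Polyuria|frequent urination]].",
        "Symptoms include [[Polyuria|frequent urination]] and [[Polyuria]].",
    ]
    totals = Counter()
    for page in pages:
        totals.update(extract_candidates(page))
    print(rank_by_link_frequency(totals, "Polyuria"))  # ['frequent urination', 'increased thirst']
```

For example, `combine_ranks(["a", "b"], ["b", "c"])` returns `["b", "a", "c"]`. Candidate `b` scores 1/2 + 1/1 = 1.5, which beats `a` (1.0) and `c` (0.5). A reciprocal-rank sum rewards candidates that several rankers place near the top, which is one plausible reading of "rank score combination."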
Similar resources
Exploiting BabelNet for Multilingual Biomedical Synonym Expansion
Our challenge contribution for CLEF-ER consists of providing annotations for all three challenge corpora (Medline, EMEA, Patents) in French and German. The objective of these experiments is to verify whether a general multilingual ontological resource such as BabelNet (http://babelnet.org) can be used to substantially enrich the terminology provided by the challenge organizer...
Understanding the Query: THCIB and THUIS at NTCIR-10 Intent Task
Understanding the intent underlying search queries has recently attracted enormous research interest. Two challenging issues are worth noting. First, words within a query are usually ambiguous, while the query in most cases is too short to disambiguate them. Second, in some cases ambiguity cannot be resolved from the limited query context alone. It is thus demanded that the ambiguity be resolved/analyzed w...
HIT2 Joint NLP Lab at the NTCIR-9 Intent Task
This report presents the principles, the search process, and the experimental results. We report our systems and experiments in the NTCIR-9 intent task. The research aims to evaluate the effectiveness of the proposed methods for query intent mining and result diversification in web search. In the subtopic mining subtask, we combine the extracted candidates from search logs...
Comparative Evaluation of Link-Based Approaches for Candidate Ranking in Link-to-Wikipedia Systems
In recent years, the task of automatically linking pieces of text (anchors) mentioned in a document to Wikipedia articles that represent the meaning of these anchors has received extensive research attention. Typically, link-to-Wikipedia systems try to find a set of Wikipedia articles that are candidates to represent the meaning of the anchor and, later, rank these candidates to select the most...
Ranking relations between diseases, drugs and genes for a curation task
BACKGROUND One of the key kinds of information that biomedical text mining systems are expected to extract from the literature is interactions among different types of biomedical entities (proteins, genes, diseases, drugs, etc.). Several large resources of curated relations between biomedical entities are currently available, such as the Pharmacogenomics Knowledge Base (PharmGKB) or the Comp...